ABSTRACT

Shape models (SMs), capturing the common features of a set of
training shapes, represent a new incoming object based on its
projection onto the corresponding model. Given a set of learned SMs
representing different objects, and an image with a new shape, this
work introduces a joint classification-segmentation framework with a
twofold goal: first, to automatically select the SM that best
represents the object, and second, to accurately segment the image
taking into account both the image information and the features and
variations learned from the on-line selected model. A new energy
functional is introduced that simultaneously accomplishes both goals.
Model selection is performed based on a shape similarity measure,
determining which model to use at each iteration of the steepest
descent minimization, allowing for model switching and adaptation to
the data. High-order SMs are used in order to deal with very similar
object classes and the natural variability within them. The
presentation of the framework is complemented with examples for the
difficult task of simultaneously classifying and segmenting closely
related shapes (stages of human activities) in images with severe
occlusions.

Index  Terms —  Shape  priors,  image  segmentation,  object 
modeling. 

1.  INTRODUCTION 

Object segmentation is one of the most fundamental tasks in image
processing, still lacking a completely automatic solution. The main
idea is to find a set of features that describes and discriminates
the object of interest from the rest of the image. Pixel color is a
low level feature that can be used as such a descriptor, although its
discrimination capacity is often insufficient in real images. Using
shape as a high level feature is a common approach to augment such
low level features.

The shape of the desired object is added as a descriptor,
constraining the set of possible solutions to regions of the image
that simultaneously “match” this shape and the low level features
(intensity, edges, etc.). The most common way to add this shape prior
information is in the form of a weighted linear combination of
functionals concerning, on one hand, the low level features and, on
the other hand, the shape priors. This leads to a minimization
problem where the solution is a compromise between the shape of the
final contour and the information constrained by the image. The
actual minimization techniques vary, including gradient descent
methods [1, 2, 3, 4, 5] and graph-cuts [6]. The shape representations
used also vary, including signed distance functions (SDF)
[1, 2, 3, 6, 5], quadratic splines [7], characteristic functions [4],
and landmark points [8].

*FL performed this work at the University of Minnesota. Work supported
by ONR, NSF, NGA, DARPA, ARO, PDTSCOP4618 and FUNDACIBA-

When $M$ different objects can appear in an image, a single shape
prior (model) is not sufficient, and multiple shape priors must be
considered. A possible, but not elegant, approach is to run the
process with each one of the shape priors separately, and then choose
the best solution. In [5, 6] the authors define $M$ possible labels
for each pixel in the image, and a segmentation energy includes the
optimization of these labels in order to determine where to apply
each prior. In [7] the authors perform density estimation in a
non-linear feature space, where different objects are separable. The
proposed energy is then minimized considering both the curve’s
control points and the image.

Considering the natural deformations and the variability of objects
in a class, high-order shape models (SMs) should be included in the
segmentation. Leventon et al. [1] compute PCA on a set of registered
shapes (see also Tsai et al. [3]), fitting a Gaussian probability
distribution to the coefficients of the reconstruction. This makes it
possible to compute the probability of a certain shape, which is
included, along with geodesic active contours for low level features,
in a MAP estimation of the object in the image. Cootes and Taylor [8]
compute, using PCA, a point distribution model of landmark points
defining a shape. More recently, Charpiat et al. [4] proposed a
framework to compute non-linear shape statistics based on the
Hausdorff distance between shapes, and then model distributions
similarly to [1].

In this work, a new framework for image segmentation with multiple
high-order shape models is introduced, addressing at the same time
the selection of the model and its image-driven adjustment to the
modeled deformations. In particular, the high-order SMs are computed
using PCA in a similar way as [1, 3], obtaining a set of eigenmodes
of variation. The selection of the model is obtained with a binary
selection coefficient, learned on-line based on a shape similarity
measure between shapes. The proposed functional combines two terms.
The first one is a region-based segmentation term [10]. The second
term is a combination of the multiple high-order SMs, addressing the
model selection and constraining the local features to the high-order
shape information coming from the on-line selected model. While the
framework is presented for planar curves, it can be easily extended
to data in higher dimensions.

Fig. 1. (a) Each point corresponds to the first three coordinates of
the mapping obtained with diffusion maps [9] for the shapes of a
walking person. Clustering into five groups and one sample shape from
each cluster (walking-cycle position) are shown. (b) Four shapes from
two different clusters (one in blue and one in red). (c) Original
shape (in blue) and its projections $\mathcal{P}_k\phi$ (in black) to
$M = 5$ different models in the walking sequence, one for each
cluster. Note how the projection is completely deformed when using
the wrong model. (d) First three modes of variation for one of the
models in the walking sequence; the thick black line is the mean
shape, and the red lines are obtained by varying the amplitude (each
figure is then the addition of the average and a constant times the
first, second, or third eigenmode, respectively).

2.  HIGH-ORDER  SHAPE  MODELS

Consider $M$ sets $\mathcal{D}_k$, $k = 1, \dots, M$, each with $N_k$
registered shapes, $\mathcal{D}_k = \{\phi_1^k, \dots, \phi_{N_k}^k\}$,
where each $\phi_i^k$ is a SDF whose zero level-set curve represents
a shape from the $k$-th class of objects. The shape deformations in
the set $\mathcal{D}_k$ are relatively small (compared with the
deformations across different classes $k$). Let $\mathcal{M}_k^d$ be
a $d$-order model that captures the intrinsic deformations of the
training set $\mathcal{D}_k$ for the class $k$. A model
$\mathcal{M}_k^d$ generates a representation of a new incoming shape
$\phi$, whose accuracy depends on the similarity between $\phi$ and
the shapes in $\mathcal{D}_k$. In this work, $\mathcal{M}_k^d$ is
derived from a PCA decomposition, and $\phi$'s representation is
given by the $d$-projection
$\mathcal{P}_k^d\phi = \mu_k + U_k^d \alpha_k$, where $\mu_k$ is the
mean shape of $\mathcal{D}_k$, $U_k^d$ is a matrix containing the
first $d$ modes of variation (eigenmodes), and
$\alpha_k \in \mathbb{R}^d$ are the corresponding reconstruction
coefficients, which of course depend on $\phi$ (see [1, section 2.1]
for details). Restricting each class to small shape variations makes
it possible to obtain accurate representations using a linear
approximation like PCA. If a class has large, non-linear
deformations, a set of clusters may be considered, and the
deformations in each cluster are linearly approximated. Finally, let
$\mathfrak{M} = \{\mathcal{M}_1, \dots, \mathcal{M}_M\}$ be a set of
SMs for the $M$ different classes of objects (for simplicity, we omit
the order $d$ in the notation from now on). Figure 1 shows SMs for a
walking person.
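The PCA model and its projection can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: it assumes each registered SDF has been flattened into a row vector, uses an SVD to obtain the eigenmodes, and runs on random toy vectors rather than real SDFs. The names `fit_shape_model` and `project` are hypothetical.

```python
import numpy as np

def fit_shape_model(shapes, d):
    """Fit a d-order PCA model to registered shapes (one flattened SDF per row)."""
    mu = shapes.mean(axis=0)                      # mean shape mu_k
    # SVD of the centered data; rows of vt are eigenmodes sorted by variance
    _, _, vt = np.linalg.svd(shapes - mu, full_matrices=False)
    U = vt[:d].T                                  # N x d matrix with the first d eigenmodes
    return mu, U

def project(phi, mu, U):
    """Return the d-projection mu + U @ alpha and the coefficients alpha."""
    alpha = U.T @ (phi - mu)                      # reconstruction coefficients
    return mu + U @ alpha, alpha

# Toy data (assumption): 20 vectors of dimension 50 varying along one direction
rng = np.random.default_rng(0)
base = rng.normal(size=50)
mode = rng.normal(size=50)
mode /= np.linalg.norm(mode)
shapes = base + rng.normal(size=(20, 1)) * mode

mu, U = fit_shape_model(shapes, d=1)
phi_new = base + 0.7 * mode                       # new sample from the same family
phi_proj, alpha = project(phi_new, mu, U)
```

Because the toy family has a single mode of variation, a first-order model reproduces `phi_new` exactly, whereas a zero-order model (`U[:, :0]`) can only return the mean shape.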


3.  PROPOSED  FRAMEWORK 


Given an input image $I : \Omega \subset \mathbb{R}^2 \to \mathbb{R}$
containing one or more shapes generated by the shape models
$\mathcal{M}_k \in \mathfrak{M}$, an energy $E$ is defined to
simultaneously select the best model(s) and obtain a segmentation of
the corresponding objects in $I$ (a single object is considered now
for simplicity),
$(\mathcal{M}^*, \phi^*) := \arg\min_{\mathcal{M}_k \in \mathfrak{M},\, \phi} E(I, \phi, c_0, c_1, \mathfrak{M})$.
This energy includes two terms linearly combined with the constant
$\lambda$,

$$E(I, \phi, c_0, c_1, \mathfrak{M}) = E_{CV}(I, \phi, c_0, c_1) + \lambda E_{SM}(\phi, \mathfrak{M}).$$

$E_{CV}$ is the energy introduced in [10], splitting the input data
into two different regions of approximately piecewise constant values
($c_0$ and $c_1$). The term $E_{SM}$ adds a force aiming at
maximizing the similarity between the evolving shape $\phi$ and its
projection onto only one of the $d$-order models from $\mathfrak{M}$.
Which one of the $M$ models is used depends on the evolution of the
shape and its projection to each model. The proposed term is


$$E_{SM}(\phi, \mathfrak{M}) = \sum_{k=1}^{M} \beta_k \int_{\Omega} \| H(\phi(p)) - H(\mathcal{P}_k^d\phi(p)) \|^2 \, dp, \qquad (1)$$


where $H$ is the Heaviside function, $\beta_k$ is a binary
coefficient that (on-line) selects which of the $M$ models is used,
and $\mathcal{P}_k^d\phi$ is the projection of $\phi$ to the model
$\mathcal{M}_k^d$. Only one of the $\beta_k$ must be different from
zero in (1), since it is not fair to penalize for models that do not
correspond to the object in the image. Which $\beta_k$ is non-zero is
determined by a shape dissimilarity measure ($T$) between two shapes,


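$$T(\phi_1, \phi_2) = \int_{\Omega} \left( \frac{|\phi_1(p)| \, \delta(\phi_2(p))}{\mathrm{length}(C_2)} + \frac{|\phi_2(p)| \, \delta(\phi_1(p))}{\mathrm{length}(C_1)} \right) dp. \qquad (2)$$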


This is a length-normalized variation of the measure introduced in
[11]. This measure evaluates the sum of Euclidean distances
corresponding to moving the contour of the first shape to points in
the contour of the second shape, and vice versa, scaled by the curve
lengths. In Figure 1(c), the projected shapes are ordered according
to increasing values of $T_k(\phi) := T(\phi, \mathcal{P}_k\phi)$
(1.35, 2.83, 3.59, 5.87, and 7.83, respectively).
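A discrete version of the length-normalized dissimilarity in (2) can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' code: the Dirac delta is replaced by a compactly supported cosine band of half-width `eps` (an assumed regularization), the curve length is approximated by the sum of that band (valid when the SDF has unit gradient), and the test shapes are synthetic concentric circles. The function names are hypothetical.

```python
import numpy as np

def delta_band(x, eps=1.5):
    """Compactly supported smoothed Dirac delta (assumed regularization)."""
    inside = np.abs(x) <= eps
    return np.where(inside, (1.0 + np.cos(np.pi * x / eps)) / (2.0 * eps), 0.0)

def dissimilarity(phi1, phi2, eps=1.5):
    """Symmetric contour-to-contour distance, normalized by curve length."""
    d1, d2 = delta_band(phi1, eps), delta_band(phi2, eps)
    len1, len2 = d1.sum(), d2.sum()     # approximate lengths, assuming |grad phi| = 1
    return (np.abs(phi1) * d2).sum() / len2 + (np.abs(phi2) * d1).sum() / len1

def circle_sdf(n, cx, cy, r):
    """SDF of a circle on an n x n grid (negative inside)."""
    y, x = np.mgrid[0:n, 0:n]
    return np.hypot(x - cx, y - cy) - r

a = circle_sdf(64, 32, 32, 15)
b = circle_sdf(64, 32, 32, 18)          # radius differs by 3 pixels
c = circle_sdf(64, 32, 32, 25)          # radius differs by 10 pixels
t_ab = dissimilarity(a, b)
t_ac = dissimilarity(a, c)
```

With this discretization the measure grows with the gap between the two contours, so `t_ab` is smaller than `t_ac`. Because of the finite band width, the value for two identical shapes is small but not exactly zero.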


Fig. 2. (a) Mode of variation for the two ellipse models (one in
green and one in red); the mean shape of both models is the same and
is plotted as a black dashed line. (b) Results for the experiment in
which the model with vertical deformations is reduced to its mean
shape. Some steps in the segmentation (see text) and the evolution of
the shape dissimilarity measure are also shown. (c) Same for the
experiment with both complete models.


Based on (2), a normalized shape similarity measure between a shape
$\phi$ and its projection to the $d$-order $k$-th model is computed
by normalizing $\zeta_k(\phi) = \exp(-T_k(\phi))$. This normalized
similarity measure is close to one for the model that better
represents $\phi$. Finally, to force a binary value in $\beta_k$,
soft thresholding based on a sigmoid function is performed. Note that
a unique coefficient is used as model selector, instead of one
coefficient in each pixel as in [5, 6]. This encourages shape
consistency and significantly simplifies the optimization.
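The selection step can be illustrated with the dissimilarity values reported for Figure 1(c). The sketch below is hedged: the text does not fully specify the normalization or the sigmoid parameters, so dividing by the sum over models, centering the sigmoid halfway between the two largest similarities, and the `steepness` value are all assumptions made for illustration. `select_model` is a hypothetical name.

```python
import numpy as np

def select_model(T, steepness=50.0):
    """Map dissimilarities T_k to near-binary selection coefficients beta_k."""
    T = np.asarray(T, dtype=float)
    zeta = np.exp(-(T - T.min()))       # exp(-T_k), shifted for numerical stability
    zeta_bar = zeta / zeta.sum()        # ASSUMED normalization: sum over the M models
    # ASSUMED soft threshold: sigmoid centered between the two largest similarities
    top2 = np.sort(zeta_bar)[-2:]
    beta = 1.0 / (1.0 + np.exp(-steepness * (zeta_bar - top2.mean())))
    return zeta_bar, beta

# Dissimilarities T_k(phi) listed for the five projections in Figure 1(c)
zeta_bar, beta = select_model([1.35, 2.83, 3.59, 5.87, 7.83])
selected = int(np.argmax(beta))
```

Under these assumptions the first model receives a coefficient close to one and the remaining four are close to zero, so a single global selector drives the shape term.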

Shape validation. With the proposed method, one model is always
selected and a segmentation is obtained, even if the shape in the
image has no appropriate model that provides a good representation.
The following measure makes it possible to discard a segmentation
$\phi_f$ given the selected model $\mathcal{M}_k$. First, the mean
$\bar{T}_k$ and variance $\sigma_k^2$ of $T(\phi_i^k, \mu_k)$ are
computed, $\phi_i^k \in \mathcal{D}_k$. Then, if
$T(\phi_f, \mu_k) > \bar{T}_k + 2\sigma_k$, the segmentation is
discarded, and the shape cannot be recognized.
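The validation rule amounts to a two-standard-deviation outlier test on the distance to the mean shape. A minimal sketch, assuming the training dissimilarities have already been computed; the numbers in `t_train` are made up for illustration and `is_valid` is a hypothetical name:

```python
import numpy as np

def is_valid(t_final, t_train):
    """Accept the segmentation unless its distance to the mean shape is an outlier."""
    t_train = np.asarray(t_train, dtype=float)
    threshold = t_train.mean() + 2.0 * t_train.std()
    return bool(t_final <= threshold)

# Hypothetical dissimilarities between training shapes and their class mean
t_train = [0.8, 1.1, 0.9, 1.3, 1.0]
accepted = is_valid(1.2, t_train)   # within two standard deviations of the mean
rejected = is_valid(3.0, t_train)   # far from every training shape
```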

Energy minimization. The proposed energy is minimized using a
classical gradient descent method. For the gradient descent of
$E_{CV}$, the expression is given in [10, Equation (9)]. For $E_{SM}$
the obtained expression is

$$\frac{\partial \phi}{\partial t} = -2 \sum_{k=1}^{M} \beta_k \, \| H(\phi) - H(\mathcal{P}_k\phi) \| \, \bigl( \delta(\phi) - W \delta(\mathcal{P}_k\phi) \bigr),$$

where $W = U_k U_k^T$. Although the model selector $\beta_k$ depends
on $\phi$, it is treated as static, as a first order approximation
for the gradient descent, since it affects the model selection and
only indirectly the evolution of the curve.

The first steps of the optimization are performed with $\lambda = 0$,
until stationarity, then $E_{SM}$ is added with $\lambda \neq 0$
(manually obtained) until a new stationary point is reached (a
similar idea is considered in [6]).
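The two-stage schedule can be illustrated on a toy problem. In the sketch below, two quadratic energies stand in for the data and shape terms; they are NOT the functionals of this paper, and the step size, tolerance, and target points are arbitrary assumptions. Only the schedule itself (descend with $\lambda = 0$ until stationarity, then switch the prior on) mirrors the text.

```python
import numpy as np

def descend(x, grad, step=0.1, tol=1e-8, max_iter=10_000):
    """Plain gradient descent until the update is smaller than tol (stationarity)."""
    for _ in range(max_iter):
        x_new = x - step * grad(x)
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Toy stand-ins (assumptions): quadratic "data" and "prior" energies
data_target = np.array([2.0, 0.0])
prior_target = np.array([0.0, 0.0])

def grad_data(x):
    return 2.0 * (x - data_target)

def grad_prior(x):
    return 2.0 * (x - prior_target)

lam = 1.1                                      # value used in the ellipse experiment
x0 = np.array([5.0, 5.0])
x_stage1 = descend(x0, grad_data)              # stage 1: lambda = 0
x_stage2 = descend(x_stage1, lambda x: grad_data(x) + lam * grad_prior(x))
```

The first stage settles at the data-only minimizer; once the prior is added, the solution moves to a compromise between the two terms, which for these quadratics is `data_target / (1 + lam)`.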


4.  EXPERIMENTAL  RESULTS 

The first example is a “toy example” with two models of ellipses,
where the only difference is that the first (and only) eigenmode is
rotated by $\pi/2$ (this already exemplifies the importance of
high-order models). Let us name $\mathcal{M}_v$ the model with
vertical deformations and $\mathcal{M}_h$ the one with horizontal
ones. Figure 2(a) shows the mode of variation for both models. The
input image contains an occluded vertical ellipse, not present in the
training set. Two different experiments are presented, varying the
order $d$ of the model $\mathcal{M}_v$. With $d = 0$, only the mean
shape is considered in the shape prior (no deformations); with
$d = 1$, the vertical deformations are considered. All the parameters
are the same in both experiments. Figures 2(b) and 2(c) show some
steps in the minimization and the evolution of the shape
dissimilarity measure for the two experiments, respectively. In each
experiment, the first marked step shows an intermediate curve in the
evolution with $\lambda = 0$, and the projections of $\phi$ to both
models (dashed colored lines). The initial curve (in yellow) is also
shown. Note that, for $d = 0$, the projection to $\mathcal{M}_v$ has
no vertical deformations. The following two marked steps show the
evolution after adding the $E_{SM}$ term ($\lambda = 1.1$), and the
last one shows the obtained segmentation.

In the first experiment, the projections to both models end in the
same shape (the mean shape), reflected in the graph of the
dissimilarity measure by the overlapping of the green and red curves.
In the second experiment, $\mathcal{M}_v$ captures the variation of
the input shape, as reflected in the obtained segmentation. In this
case there is also a model switching around step 200 (marked in the
figure), where $\mathcal{M}_h$ is selected while the occlusions are
being filled. After this point, the vertical deformation determines
the selection of $\mathcal{M}_v$ for the rest of the evolution,
ending with an accurate segmentation.

For the next experiments, five models of a walking person cycle,
$\mathcal{M}_k^d$, $k \in [1, 5]$ ($d = 21$), shown in Figure 1, are
used. This set of models is particularly challenging since they are
different deformations of the “same object.” Five new occluded shapes
$\phi_k$, each one belonging to a different model and not in
$\mathcal{D}_k$, are segmented (Figure 3). In each case the correct
model is selected, and the segmented shape is adjusted to the gray
levels, when present, and correctly completes the occlusions when the
image information is missing, Figure 3(a) (in all the cases, the
segmentations were validated with the proposed measure). Figure 3(b)
plots the evolution of the shape dissimilarity measure for $\phi_2$.
Note that when the shape prior is added ($\lambda = 1.02$),
$T_2(\phi)$ decreases faster than the others. The abrupt decay around
step 120 corresponds to the filling of the main occlusion.
Figure 3(c) shows $\phi_2$ with the curve at the step when the shape
prior is added, and the projections of $\phi$ (blue curve) to the
five models, coded with colors. Finally, Figure 3(d) shows the
segmentation obtained with our framework for gray level images. In
this case, the automatically selected models, as well as the obtained
segmentations, are correct and accurate.

Fig. 3. Segmentation with walking model. See text for details.

Additional results, including the automatic classification and
segmentation of multiple objects, generated from a standard dataset
of fish, in noisy images, are presented at the conference.

5.  CONCLUDING  REMARKS 

A framework for simultaneous and automatic model selection and object
segmentation was introduced in this paper. The proposed technique is
based on a new energy that combines region-based segmentation with
on-line selection of the best model for the object present in the
image.

Possible directions for further improvements include incorporating
high-order modes in the validation step and considering going beyond
PCA, as well as including class-dependent model orders ($d_k$).
Results in these directions will be reported elsewhere.